Why has (reasonably accurate) Automatic Speech Recognition been so hard to achieve?
Abstract
It has now been over 35 years since hidden Markov models (HMMs) were first applied to the problem of speech recognition ([2], [7]). Moreover, it has been over 20 years since the speech recognition community seemed to collectively adopt the HMM paradigm as the most useful general approach to the fundamental problem of modeling speech. Perhaps a key turning point in this regard was Kai-Fu Lee’s thesis work [8], in which he clearly explained how to train an HMM-based system and then successfully applied a series of variations on the HMM theme to the Resource Management task, which was defined by DARPA and whose results were publicly evaluated by NIST. This is not to say that the HMM has escaped critique as a model of speech, or that alternatives have not been proposed and even explored at some length. One thinks of segmental models of various sorts ([17], [12]) and, more recently, of graphical models ([3]). Nonetheless, we think it is fair to say that in 2010 the HMM remains the consensus model of choice for speech recognition, and that it lies at the heart of both commercially available products and contemporary research systems. However, in spite of the great success of the HMM paradigm, all is not well in the Land of Speech Recognition. Machine error rates on natural speech (e.g., conversational material found in Switchboard or Fisher data) are still very high (around 15% [4]), compared to what is achievable by
Journal title:
- CoRR
Volume abs/1003.0206, Issue -
Pages -
Publication date 2010